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Abstract 

Background: The study of metabolism has attracted much attention during the last years due to its relevance in 
various diseases. The advance in metabolomics platforms allows us to detect an increasing number of metabolites 
in abnormal high/low concentration in a disease phenotype. Finding a mechanistic interpretation for these 
alterations is important to understand pathophysiological processes, however it is not an easy task. The availability 
of genome scale metabolic networks and Systems Biology techniques open new avenues to address this question. 

Results: In this article we present a novel mathematical framework to find enzymes whose malfunction explains 
the accumulation/depletion of a given metabolite in a disease phenotype. Our approach is based on a recently 
introduced pathway concept termed Carbon Flux Paths (CFPs), which extends classical topological definition by 
including network stoichiometry. Using CFPs, we determine the Connectivity Curve of an altered metabolite, which 
allows us to quantify changes in its pathway structure when a certain enzyme is removed. The influence of enzyme 
removal is then ranked and used to explain the accumulation/depletion of such metabolite. For illustration, we 
center our study in the accumulation of two metabolites {L-Cystine and Homocysteine) found in high concentration 
in the brain of patients with mental disorders. Our results were discussed based on literature and found a good 
agreement with previously reported mechanisms. In addition, we hypothesize a novel role of several enzymes for 
the accumulation of these metabolites, which opens new strategies to understand the metabolic processes 
underlying these diseases. 

Conclusions: With personalized medicine on the horizon, metabolomic platforms are providing us with a vast 
amount of experimental data for a number of complex diseases. Our approach provides a novel apparatus to 
rationally investigate and understand metabolite alterations under disease phenotypes. This work contributes to the 
development of Systems Medicine, whose objective is to answer clinical questions based on theoretical methods 
and high-throughput "omics" data. 



Background 

Metabolism comprises the inter-conversion of small 
molecules (metabolites) through enzymatically catalyzed 
biochemical reactions. These metabolites play a key role 
in different cellular functions, ranging from energy pro- 
duction to biosynthesis of complex macro-molecules. 
Metabolic alterations have been reported in a number of 
multifactorial diseases [1]. In particular, their abnormal 
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role in cancer cells currently constitutes a hot topic in 
the field of molecular systems biology [2]. Several import- 
ant works have recently emphasized this feature of cancer 
cells and have thrown light on their underlying complex 
regulatory processes [3], indicating novel ways to target 
malignant tumors. Extending the study of metabolic pro- 
cesses to other diseases is essential to complete our under- 
standing of their key pathophysiological processes. 

For this purpose, it is of utmost importance to exploit 
the information given by the spearhead experimental 
technologies that directly or indirectly provide a meta- 
bolic picture of different human samples. Gene expres- 
sion analysis [4], quantitative protein measurement [5], 
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metabolomics [6] and isotope labeling experiments [7] 
are the most widespread techniques when analyzing 
metabolic processes. Their integration into different 
mathematical models, mainly based on linear and non- 
linear optimization, has already provided relevant in- 
sights into different disease phenotypes [8,9]. Continuing 
this integration task is currently a relevant area in sys- 
tems biology and medicine [10]. 

The field of metabolomics has experienced a remarkable 
advance in mass spectrometry techniques and currently 
can measure hundreds of metabolites simultaneously [11]. 
In contrast with gene and protein expression, which are 
subject to stringent regulatory processes, metabolite abun- 
dance is closer to biochemical activity and therefore easier 
to correlate with cellular phenotype [12], as summarized in 
Figure 1A. For this reason, metabolomics has become a 
powerful approach for clinical diagnostics and personalized 
medicine [13]. In addition, metabolomics data potentially 
involves rich and valuable information to understand meta- 
bolic alterations underlying a disease phenotype. However, 
the detailed mechanistic interpretation of changes in me- 
tabolite abundance is not straightforward, as they may arise 
from different sources, some of them unlikely to be related 
with the phenotype of interest. Therefore, establishing 



effective methods to provide a functional interpretation to 
metabolomics data is required. 

Assorted methods can be found in the literature to in- 
terpret metabolomics data based on curated metabolic 
pathways and networks [14]. On the one hand, kinetic 
[15] and thermodynamic [16] approaches directly relate 
metabolite concentrations with the activity of enzymes 
in a particular metabolic pathway. However, due to the 
high computational cost and intensive prior knowledge 
(kinetic constants/proteomics data) required by these 
methods, they have been directly applied to a limited num- 
ber of metabolic pathways [17]. On the other hand, several 
tools mapping identified metabolites onto pre-defined 
metabolic pathways have been developed [18]. However, as 
pre-defined pathways do not fully capture complex meta- 
bolic states, it is difficult to extract the mechanism(s) re- 
sponsible for changes in metabolite levels. To overcome 
this issue, different tools using network-based pathway 
(mathematical) definitions were introduced [19,20]. These 
methods take advantage of the plasticity and connectivity 
of genome-scale metabolic networks to provide a more 
compact functional view of metabolomics data [21]. 

In this article, we address the question as to find the 
key enzymes regarding the accumulation/ depletion of 



A 




Genomics^ 



Genotype B 



Transcription 



V 




V 



Transcriptomics 



RNA y ' c? 



) - 



* 4 % z 




Translation 



V 



V 



Protein 




Proteomics, 



Metabolism 



V 



Metabolomics/ 



V 



Metabolites CH 2 



V 



Phenotype 



OH 



o 



o. 



OH 



OH ^ 



o 



o. 



OH 



o o 

HoC o 




CELL 



n • • 

connectivity 
curve 

it 

H,C O 



d3 



CELL 



OH 



Figure 1 Schematic representation of A) different levels of complexity at molecular level; B) the purpose of our methodology. 
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identified metabolites (see Figure IB). Given that metab- 
olite abundance is recognized as the most accurate indi- 
cator of phenotype, deciphering enzymes that regulate 
such phenotype certainly constitutes a relevant question 
and links metabolite data with upstream molecular mech- 
anisms. A similar investigation was conducted by Cakir 
et al. [22], where enzymes with statistically significant co- 
ordinated changes in the abundance of surrounding me- 
tabolites were identified. Given the high inter-connectivity 
of metabolic networks, key enzymes are not necessarily 
topological neighbors of identified metabolites, as as- 
sumed in that work. Instead, enzymes at considerable dis- 
tances from identified metabolites may exert control on 
the flux of their underlying biosynthesis and degradation 
pathways and therefore alter their concentration. 

In this light, we introduce a novel theoretical concept 
termed "connectivity curve (CC)", which summarizes the 
structure of pathways consuming (producing) an identi- 
fied metabolite and their underlying distances in a 
genome-scale metabolic network. Pathway distances are 
used as a clue for their fluxes in line with several theor- 
etical works that show that shorter pathways support a 
higher flux than longer pathways [23-25]. Using CCs we 
examine whether the removal of a certain enzyme in- 
creases degradation (biosynthesis) pathway distances of 
an identified metabolite, which therefore would decrease 
its degradation (biosynthesis) fluxes and thus may lead 
to its accumulation (depletion). 

In order to determine metabolic pathways and distances 
in CCs, we used Carbon Flux Paths (CFPs), a network- 
based pathway concept recently introduced by Pey et al. 
[26]. The CFP approach searches for simple paths (in the 
graph theoretical sense) between a given pair of source/ 
target metabolites. In addition, the CFP approach ensures 
that the obtained path satisfies a relevant set of biophys- 
ical constraints, such as mass balance (usually referred to 
as steady-state condition), going beyond classical path 
finding approaches. 

Based on CCs, we rank enzymes in the network as re- 
sponsible for the accumulation (depletion) of an identified 
metabolite. To assess the performance of our approach, 
we investigated key enzymes corresponding to the accu- 
mulation of two metabolites (L-Cysteine and Homocyst- 
eine) closely related with mental disorders. 

Results and discussion 

This section is organized as follows. We first illustrate 
the concept of CCs and several parameters arising from 
them by means of a classical metabolic network re- 
presenting glucose metabolism [27]. This example is also 
used to describe the statistical validation conducted. We 
then discuss the resulting key enzymes obtained when ap- 
plied our approach to explain the accumulation of L-Cyst- 
eine and Homocysteine in mental disorders. 



Connectivity Curve (CC) approach 

Assume a metabolite is identified in significantly high 
concentration and we are concerned with finding en- 
zymes whose malfunction explains its accumulation. 
For this purpose, we introduce the connectivity curve 
(CC), which plots the number of metabolites connected 
after moving away n steps (reactions) from such identi- 
fied metabolite. In order to determine whether this 
identified metabolite is connected to other metabolites 
and their distances, we used the Carbon Flux Paths 
(CFPs) approach. The CFP algorithm searches for the 
shortest path between a pair of source and target metab- 
olites in the context of the metabolic network, which is 
here represented as a metabolite-metabolite directed 
graph. To avoid not biologically meaningful shortcuts, 
we removed arcs not involving an effective carbon ex- 
change. The use of integer linear programming allows 
us to impose further constraints. In particular, we in- 
corporate reaction stoichiometry and force paths to 
satisfy the mass balance equation and perform in 
sustained steady-state. Further details can be found in 
Methods section. 

To illustrate the concept of CCs, consider the example 
metabolic network in Figure 2 based on Schuster et al. 
[27]. This network comprises the major part of the mono- 
saccharide metabolism, involving glycolysis, pentose phos- 
phate pathway (PPP) and part of gluconeogenesis, as well 
as an input flux of Glucose (D-Glc) and three output 
fluxes of CO2, ribose-5-phosphate (R5P) and Pyruvate 
(Pyr). We assumed that Glucose-6-Phosphate (G6P) is 
identified in high concentration and therefore enzymes re- 
sponsible for its accumulation are evaluated. Based on 
CFPs, we calculated the minimum number of steps (reac- 
tions) necessary to reach any metabolite from G6P, e.g. 7 
steps are required to reach Pyr. Note that the CFP ap- 
proach is applied once for each metabolite different to 
G6P. Red solid line in Figure 3A shows the CC for G6P. 
We can observe that from G6P 2 metabolites can be 
reached in 1 step; 8 metabolites in 2 steps, etc.; finally 
reaching 17 metabolites in 7 steps. Full details can be 
found in the Additional file 1. 

We assume that distances (number of steps) provide a 
clue of the degradation fluxes (velocity) of the metabol- 
ite under study, namely shorter distances imply higher 
fluxes. A more efficient use of resources via shorter 
pathways overall allows reactions to carry higher fluxes, 
which indirectly increases the capacity to produce bio- 
mass and energy. In particular, shorter pathways reduce 
mass leaks, wasted energy and the amount of protein re- 
quired to catalyze a process, as discussed in a number of 
theoretical works [23-25]. There is also experimental 
evidence in E. coli evolution studies that the decrease in 
the number of active reaction steps and increase in 
growth rate occur simultaneously [28] . 
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Figure 2 Example metabolic network involving glycolysis and pentose phosphate pathway [27]. Reaction arcs with a white circle indicate 
the reaction is reversible. Abbreviations of enzymes: Eno, enolase; Fba, fructose 1,6-bisphosphate aldolase; Fbp, fructose 1,6-bisphosphatase; Gap, 
glyceraldehydes 3-phosphate dehydrogenase; Gnd, phosphogluconate dehydrogenase (decarboxylating); Gpm, phosphoglycerate mutase; Hex1, 
hexokinase; Pfk, 6-phosphofructokinase; Pgi, phosphoglucoisomerase; Pgk, phosphoglycerate kinase; Pgl, phosphogluconolactonase; Pyk, pyruvate 
kinase; Rpe, ribulose-phosphate 3-epimerase; Rpi, ribose 5-phosphate isomerase; Tal, transaldolase; Tktl, transketolase; Tktll, transketolase; TpiA, 
triosephosphate isomerase; Zwf, glucose 6-phosphate dehydrogenase. Abbreviation of metabolites: 1.3BPG, 3-Phospho-D-glyceroyl phosphate; 
2PG, D-Glycerate 2-phosphate; 3PG, 3-Phospho-D-glycerate; 6PG, 6-Phospho-D-gluconate; D-GIc, D-Glucose; DHAP, Dihydroxyacetone phosphate; 
Ery4P, D-Erythrose 4-phosphate; F6P, D-Fructose 6-phosphate; FP2, D-Fructose 1,6-bisphosphate; G6P, D-Glucose 6-phosphate; GAP, 
Glyceraldehyde 3-phosphate; G06P, 6-phospho-D-glucono-1,5-lactone; PEP, phosphoenolpyruvate; Pyr, pyruvate; R5P, alpha-D-Ribose 
5-phosphate; Ru5P, D-Ribulose 5-phosphate; Sed7P, sedoheptulose 7-phosphate; Xyl5P, D-Xylulose 5-phosphate. 



Based on the above, CCs can be used to evaluate 
changes in CFP distances in different scenarios. We par- 
ticularly focus here on single reaction knockouts, i.e. we 
remove one-by-one each reaction from the metabolic net- 
work and re-calculate the CC for the identified metabolite. 



To illustrate this, green dashed line in Figure 3A shows 
the CC for G6P when phosphoglucoisomerase (Pgi) was 
knocked out. We can observe now that from G6P 1 me- 
tabolite can be reached in 1 step; 2 metabolites in 2 
steps; 3 metabolites in 3 steps, etc.; finally reaching 17 
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Figure 3 Results arising from the example network in Figure 2. A) Connectivity curve for G6P in different scenarios; B) Length increasing 
parameter (ALj), disconnected metabolites and p-value for each enzyme knockout. Shaded reactions do not satisfy filtering criterion. 
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metabolites in 10 steps. It is clear that the knockout of 
Pgi causes an increase in the distances from G6P to the 
rest of the metabolic network. The effect of this knock- 
out in the set of CFPs from G6P to the rest of metabo- 
lites can be found in the Additional file 1. We repeat the 
procedure for each enzyme in the example network. 

We hypothesize that the distance increase observed in 
our CCs approach after knocking-out a particular en- 
zyme represents a decrease in the degradation flux of 
our identified metabolite, which may lead to its accumu- 
lation. As noted above, the basis is that shorter paths 
carry higher flux [23-25] and thus their blockage would 
lead to the usage of less efficient alternative pathways 
with the corresponding concentration alterations. There- 
fore, the malfunction of such enzymes may explain a sig- 
nificant increase in the concentration of the metabolite 
under study. Since we assume that the identified metab- 
olite is closely related with a certain disease phenotype, 
these enzymes may constitute regulatory key points for 
such phenotype. 

In order to quantitatively measure the increase in dis- 
tances from the metabolite under study to the rest of the 
metabolic network when enzyme /' is knocked out, we 
introduce the length increasing parameter {ALj), which 
essentially averages distance differences between the 
knockout and wild-type scenarios (see Methods section). 
Note that we may have the case that, when an enzyme is 
knocked out, a number of metabolites become discon- 
nected from the metabolite under study. In order to 
determine ALj, we only consider changes in the distance 
to metabolites that remain connected after the knock- 
out. To illustrate this consider the blue dotted line in 
Figure 3A, which represents the CC for G6P when 
triosephosphate isomerase (TpiA) was knocked out. We 
can observe that 16 metabolites are now reached from 
G6P, namely one less than in the wild-type scenario. In 
particular, dihydroxy 'acetone phosphate (DHAP) cannot 
be reached from G6P since TpiA is essential for its mass 
balance. Although we can still find routes linking G6P 
and DHAP, none of them can be mass balanced (see 
Additional file 1). 

This parameter {ALj) allows us to rank enzymes as re- 
sponsible for the accumulation of the metabolite under 
consideration. Clearly, we are interested in enzymes that 
when knocked out present a high ALp which can poten- 
tially explain the accumulation of the metabolite of 
interest. Figure 3B details ALj in our example metabolic 
network. Pgi constitutes the most relevant enzyme for the 
accumulation of G6P. Indeed, blocking Pgi eliminates the 
classical pathways for glucose consumption, but others 
are still available. This can be easily observed in the ana- 
lysis of Elementary Flux Modes (EFMs) conducted in the 
seminal work of Schuster et al. [27], namely few EFMs 
consuming G6P are left active after the knockout of Pgi 



and they involve more steps, for example, to reach Pyr. 
Note that we extract similar conclusions as when using 
the EFMs analysis because our CFP approach forces the 
mass balance constraint in the resulting paths. However, 
the use of EFMs approach for large metabolic networks is 
difficult due to combinatorial explosion [29]. We show 
below that our approach scales well even in such large 
metabolic networks. 

We are aware of the fact that other mechanisms may 
lead to an accumulation or depletion of metabolites. For 
example, up-regulation of biosynthetic pathways of the 
metabolite under study may explain its accumulation. 
Note however that this is more difficult, since an in- 
crease in pathway flux is typically achieved if all enzymes 
[30] or (at least) a considerable number of enzymes in a 
pathway [31,32] are over-expressed. For simplicity, this 
strategy has been not considered in this work. 

We would like to clarify that we have focused on the 
accumulation of identified metabolites, but the analysis 
of depleted metabolites can be easily accomplished. 
Indeed, the definition of CCs would slightly change, in- 
volving the number of metabolites reaching the metab- 
olite under consideration after moving back n steps. 
Similarly, distances in CC provide a clue as to the bio- 
synthesis fluxes of the identified metabolite. Therefore, 
when studying a depletion instead of an accumulation, 
we would search for enzymes that, when knocked-out, 
significantly increase such distances. This would de- 
crease the biosynthesis flux of the metabolite under 
study and therefore lead to its depletion. 

Other issues: filtering criterion and specificity 

As noted above, the knockout of certain enzymes may 
bring about the disconnection of pairs of metabolites. 
Clearly, if key metabolites are disconnected, important 
damage in cellular functions may be caused, even lead- 
ing to cellular death. As we assume that phenotypic 
changes in metabolite abundance are more subtle, we 
are not aiming here at enzyme knockouts producing 
radical disconnections. Instead, we search for knockouts 
impairing but not disrupting the normal functioning of 
metabolic processes. For this purpose, we only consider 
enzymes that when knocked out do not alter the con- 
nectivity between the inputs (medium metabolites) and 
outputs (excreted metabolites) of the metabolic network. 
Note that we could impose other criteria, e.g. guarantee- 
ing a particular biological function such as biomass pro- 
duction. To illustrate this, in the example in Figure 2 we 
have as inputs D-Glc and as outputs Pyr, CO2 and R5P. 
It can be easily observed that the knockout of pyruvate 
kinase (Pyk) disconnects the biosynthesis of Pyr from D- 
Glc, which violates our rule above and therefore it is not 
considered further. Instead, when TpiA is knocked out, al- 
though DHAP is disconnected from G6P, the connectivity 
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between D-Glc and Pyr, CO2 and R5P is still found and 
therefore it is a viable knockout. Rows in the table in 
Figure 3B corresponding to enzymes violating this rule are 
shaded in grey. To conduct this task, we again used the 
CFPs approach. 

For the sake of simplicity, it would be preferable to 
relate each enzyme with a unique metabolite accumula- 
tion, i.e. knocking out a certain enzyme would only lead 
to significant changes in distances from a unique metab- 
olite. To evaluate this, we introduce a parameter re- 
presenting the specificity of the actual enzyme knockout 
for the accumulation of the metabolite under study. In 
particular, we assigned a p-value for each enzyme, which 
defines the probability of finding such enzyme in an 
equal/better position in the ranking of a different 
metabolite. Details can be found in Methods section. 
Figure 3B includes the p-values for each enzyme in our 
example metabolic network. Note here that the best p- 
value that can be attained in our approach is one over 
the number of metabolites in the network under study. 
Given the reduced number of metabolites involved in 
this example network, this explains why p-values are 
not particularly low. As can be seen in the next section, 
this lack of statistical power is overcome as the network 
size increases. The lowest p-value is found for Pgi, 
which indicates that its knockout leads to the accumula- 
tion of a reduced set of metabolites, clearly including 
G6P. As noted above, we selected the most parsimoni- 
ous solution and focused on specific knockouts, i.e. 
those having a small p-value. But a high p-value might 
not be undesired for other questions, since complex dis- 
eases may potentially present alterations in the concen- 
tration of more than one metabolite. 

Case studies: metabolite accumulation in mental 
disorders 

In this sub-section we apply the approach presented 
above to rank enzymes responsible for the accumulation 
of L-Cysteine (LCystin) and Homocysteine (HCys), me- 
tabolites found in high concentration in some mental 
disorders. For this purpose, we used the human metabolic 
network Reconl [33]. As others methods from constraint- 
based modeling, the application of our approach is 
dependent on the definition of the growth medium, i.e. 
available substrates. In this work a minimal medium based 
on glucose and amino acids was used [34]. We allow a 
way out of the network for exchange (external) metabo- 
lites found in Reconl not included in the growth medium. 

As noted above, to avoid not meaningful shortcuts 
when applying the CFP method, we removed arcs not 
involving an effective carbon exchange in each reaction. 
Based on Reconl, a list of pairs of metabolites exchan- 
ging carbon atoms for each reaction was built (see 
Additional file 2). We also neglected carbon arcs 



corresponding to hub metabolites, namely CoA, CO2, 
AMP, ATP and ADP, as typically done in others works 
[35]. In addition, given their participation in a high 
number of reactions, they can cause not meaningful 
shortcuts and disrupt carbon flux along the path. Note 
however that those metabolites were not removed from 
the stoichiometric matrix and therefore they must be 
mass balanced (see Methods section). 

L-Cystine accumulation 

The accumulation of LCystin is a relevant phenotype in 
Cystinosis, which may cause different tissue failure. In 
particular, brain atrophy was observed in this condition 
[36,37]. The principal cause of this accumulation is asso- 
ciated with the malfunction of the LCystin transport 
across lysosomal membrane [38]. 

Using the approach presented above, we explore alter- 
native scenarios leading to LCystin accumulation in the 
context of brain damage. Table 1 summarizes results 
arising from our approach (full results can be found in 
Additional file 3). In particular, we present details as to 
the four top-ranked enzymes responsible for the accu- 
mulation of LCystin. In order to evaluate the perform- 
ance of our method, we discuss below the role of these 
enzymes in several mental disorders. 

The first enzyme in the ranking is cysteine oxidase 
(CYSO), also referred to as cysteine dioxigenase, which 
presents ALj-0.7. Perry et al. [39] reported a decreased 
activity of this enzyme in the brain of a Pantothenate 
kinase-associated neurodegeneration (PKAN) patient (dis- 
ease formerly known as Hallervorden-Spatz syndrome). 
This syndrome is characterized by rigidity, spasticity, dys- 
tonia and dementia among others. In that work they also 
found an accumulation of LCystin in the globus pallidus 
of the brain, precisely the same region where they mea- 
sured the decreased activity of CYSO, clearly in line with 
our hypothesis. It is relevant to note that we are aiming at 
enzyme malfunctions that could lead to LCystin accumu- 
lation and our first predicted enzyme turned out to have 
been previously related with this phenotype in the litera- 
ture. It is worth to mention that the length of CFPs be- 
tween LCystin and more than three hundred metabolites 
were affected by the removal of this enzyme. This fact 
makes infeasible to systematically arrive at a clear network 
that summarizes pathway changes for visual inspection, 

Table 1 Details of 4 top-ranked reactions responsible for 
the accumulation of LCystin 

Reaction names ALj p-value 

Cysteine oxidase 0.700348 0.0060 

Formaldehyde dehydrogenase 0.637731 0.0190 

S-formylglutathione hydralase 0.637731 0.0190 

Sulfite oxidase 0.636891 0.0060 
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which emphasizes the advantages of using (numerical) 
parametric methods as the one introduced here. 

In the second and third position appear formaldehyde de- 
hydrogenase (FALDH) and S-Formylglutathione hydrolase 
(SFGTH) both with AL ; =0.638. FALDH is currently split 
into two independent reactions: EC 4.4.1.22 Glutathione- 
dependent formaldehyde-activating enzyme, GFAE, and 
EC 1.1.1.284 S-(hydroxymethyl) glutathione dehydrogen- 
ase, ADH3 [40]. 

GFAE consumes Formaldehyde, a highly toxic metab- 
olite previously reported in several mental disorders 
[41,42]. A known mechanism reducing this toxic metab- 
olite is the formaldehyde oxidation pathway [43]. In es- 
sence, this pathway comprises the enzymes under study: 
GFAE, ADH3 and SFGTH. Thus, these three enzymes 
are of utmost importance to reduce Formaldehyde con- 
centration. In light of this, some authors proposed to 
increase the activity of these Formaldehyde-consuming 
enzymes so as to decrease brain damage [42,44]. We 
suggest that this pathway protects brain against damaging 
processes by reducing not only Formaldehyde presence 
but also LCystin concentration. Note that FALDH and 
SFGTH have a greater p-value, which indicates that they 
are not as specific to the LCystin accumulation as CYSO. 
Hence, finding effects on other metabolites in the litera- 
ture does not seem out of place. 

The last enzyme included in Table 1 is sulfite oxidase 
(SULFOX) with Z\L,=0.637. An insufficiency of this en- 
zyme causes a disease known as sulfite oxidase defi- 
ciency, characterized by neurological disorders, mental 
retardation and brain degradation [45]. Dublin et al. [46], 
measured an accumulation of LCystin in urine in a patient 
with sulfite oxidase deficiency. However, the same study 
reported a depletion (not an accumulation) of this metab- 
olite in blood. In any case, this study suggests a connec- 
tion between LCystin and sulfite oxidase deficiency. 
Interestingly, note there is no carbon flux from LCystin to 
any metabolite taking part in SULFOX, since they are 
inorganic metabolites {Sulfite, Sulfate, H 2 0 and H + ) and 
cofactors (Ferricytochrome c and Ferrocytochrome c). 
However, this reaction is important to mass balance the 
obtained paths. When it is knocked out, original short 
paths become unavailable and the average distance in- 
creases. This remark highlights the importance of balan- 
cing the paths as previously claimed in Pey et al. [26] . 

Homocysteine accumulation 

A similar study was conducted for the accumulation of 
HCys, which has been previously linked to Alzheimer's 
disease (AD) [47,48]. Full results when our approach 
was applied in this scenario can also be found in 
Additional file 3. The four top-ranked enzymes are 
shown in Table 2. A brief discussion as to the relevance 
of these enzymes in AD is presented below. 



Table 2 Details of 4 top-ranked reactions responsible for 
the accumulation of HCys 



Reaction names 


At, 


p-values 


Phosphatidylethanolamine N-methyltransferase 


0768421 


0.0010 


S-Adenosyl-L-methionine reversible transport, 
mitochondrial 


0768421 


0.0010 


Phosphatidylserine decarboxylase 


0767333 


0.0130 


Phosphatidylserine flippase 


0767333 


0.0130 



The first enzyme in the ranking is Phosphatidyletha- 
nolamine N-methyltransferase (PEMT), with ALj=0.768. 
This mitochondrial enzyme catalyzes the methylation of 
Phosphatidylethanolamine (PE) producing Phosphat- 
idylcholine (PC). This enzyme has been previously re- 
lated with AD. In Johnson and Blusztajn [49], this 
enzyme is proposed as a possible target for AD. In par- 
ticular, they suggest that activating PEMT would be 
beneficial for cholinergic neurons, since PC production 
would be promoted. A similar conclusion was achieved 
in the work of Guan et al. [50]. They localized a deficit 
of this enzyme in the frontal cortex of brain affected 
with AD, precisely one of its most affected regions. 
From a different angle, Selley [51], aims at the accumu- 
lation of S-Adenosyl-L-homocysteine (SAH) as a possible 
cause for the malfunction of this enzyme in liver for AD 
patients. Interestingly, in that work evidences are also 
found to relate the inhibition of PEMT and the accumu- 
lation of HCys, which is in line with our hypothesis. 

S-Adenosyl-L-methionine reversible transport (SLC25A26) 
is found in the second position in Table 2, with zlZ, ; =0.768. 
This enzyme transports mitochondrial SAH into cytosol 
and cytosolic S-Adenosyl-L-methionine (SAM) into mito- 
chondria. It should be pointed out that this is the only 
mechanism producing mitochondrial SAM in the network 
under study [33]. As this metabolite is required for the 
activity of PEMT (first enzyme in the ranking), SLC25A26 
is essential for PEMT, i.e. the lack of SLC25A26 inhibits 
PEMT since the latter cannot perform in sustained steady- 
state without the former. From a different perspective, note 
that the inhibition of SLC25A26 may lead to the accumula- 
tion of SAH in mitochondria since, to our knowledge, this 
enzyme is the only one consuming SAH in such compart- 
ment. Following the hypothesis presented in Selley [51], this 
would inhibit PEMT as mentioned in the previous dis- 
cussion. In summary, from two different perspectives, 
one theoretical and another experimental, we highlight 
the importance of SLC25A26 to guarantee the activity 
of PEMT, which is closely related with an accumulation 
of HCys and AD [51]. 

Next enzyme appearing in Table 2 is Phosphatidylserine 
decarboxylase (PISD), with ALj= 0.767. This mitochondrial 
enzyme decarboxylases Phosphatidylserine (Pser) produ- 
cing a molecule of PE and C0 2 . A direct effect when the 
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activity of PISD decays is the accumulation of PSer, which 
has been indicated as a molecular signature of AD pa- 
tients [52]. In addition to this, Salvador et al. [53], provides 
insights of the decreased activity of PISD during aging, 
which is known to increase the risk of suffering AD. 

The last enzyme appearing in Table 2 is phospha- 
tidylserine flippase (PSFLIP), also with ALj= 0.767. 
PSFLIP is an ATP-consuming transporter of Pser from 
the cytosol to the mitochondria. This enzyme helps to 
maintain the membrane lipid bilayer asymmetry. When 
asymmetric collapse occurs, a signaling mechanism of 
synaptosomal apoptosis is triggered [54], as it occurs in 
AD [55]. In Castegna et al. [54] it is proposed that the 
oxidative environment characteristic in AD might contain 
compounds that interfere with the activity of PSFLIP and 
this may produce the unwanted asymmetric collapse. 
Similar conclusions were presented by [56] . 

In conclusion, based on the literature, we found that the 
deficit of PEMT seems to have a direct connection with 
HCys accumulation in AD. Instead, the role of SLC25A26, 
PISD and PSFLIP is hypothetical. As SLC25A26 is essen- 
tial for the activity of PEMT, its implication in HCys accu- 
mulation seems plausible, though additional experimental 
evidence is required. With respect to PISD and PSFLIP, 
we found an indirect association with HCys concentration 
through shared AD diagnosis. As AD is a complex disease, 
this link is not particularly compelling. Therefore, experi- 
mental work is required to validate the role of PISD and 
PSFLIP in HCys accumulation. 

Conclusions 

In this work, we present a novel network-based frame- 
work to find candidate enzymes whose malfunction is 
responsible for the accumulation of a given metabolite. 
Our approach was applied to investigate the accumula- 
tion of L-Cystine (LCystin) and Homocysteine (HCys) in 
mental disorders. Results were then discussed based on 
literature and found a good agreement with previously 
reported mechanisms. In addition, we hypothesize a novel 
role of several enzymes for the accumulation of these me- 
tabolites, which opens new strategies to understand the 
metabolic processes underlying these diseases. This is il- 
lustrated, for example, with S-Adenosyl-L-methionine re- 
versible transport (SLC25A26), whose relevance for the 
accumulation of HCys, to our knowledge, has not been 
previously reported. 

Our in-silico framework is mainly founded on the 
study of Connectivity Curves (CCs) of the metabolite 
under study in different scenarios. In particular, CCs 
summarize the pathway structure from an identified me- 
tabolite and their underlying distances, which are used 
as a clue for their fluxes. For the metabolite under study, 
we evaluate changes in CCs when an enzyme / is removed 
via ALj, namely based on the logic that an increase of 



distances in its degradation (biosynthesis) pathways poten- 
tially explains its accumulation (depletion). The central 
hypothesis here is that shorter pathways carry higher flux 
than longer pathways. This assumption is supported by 
several theoretical works and it seems plausible, particu- 
larly according to the results obtained. However, the 
integration of "omics" data into our approach, especially 
proteomics and gene expression data, constitutes a future 
research direction with the aim of providing a more realis- 
tic pathway activity. 

In order to complement the ranking arising from CCs, 
we introduced a p-value for each enzyme, which is a 
quantitative parameter indicating the specificity of a par- 
ticular enzyme knockout/ malfunction to explain the me- 
tabolite alteration under consideration. For the sake of 
simplicity, we preferably focused on enzymes with higher 
specificity for the accumulation/depletion of the metab- 
olite under study. However, a high p-value might not be 
undesired for other biological questions, since complex 
diseases may potentially present alterations in the con- 
centration of more than one metabolite. This possibility 
will be explored in the future. 

In addition, in our approach we did not consider en- 
zyme knockouts disrupting key cellular metabolic func- 
tions. Despite the fact that metabolism is typically altered 
in a disease scenario, we assume that essential functions 
can be still accomplished in the absence of an enzyme. In 
other words, we are not seeking lethal knockouts but pos- 
sible malfunctions explaining the observed accumulations/ 
depletions. For this purpose, we only analyzed knockouts 
not producing disconnections between medium metabo- 
lites (substrates) and cellular excreted metabolites. Clearly, 
this criterion can be revisited and modified according to 
the biological scenario under study, e.g. forcing the pro- 
duction of a particular set of metabolites required for 
cellular growth. 

In summary, our approach involves three main ingre- 
dients: CCs and their parameters, p-value and filtering 
criterion. These ingredients share the use of Carbon Flux 
Paths (CFPs) for their determination. This pathway con- 
cept was recently introduced and goes beyond path 
finding techniques by accounting for additional bio- 
physical constraints. In order to apply the CFP to the 
human genome-scale metabolic network presented in 
Duarte et al. [33], we manually built a database indicat- 
ing input and output metabolites that exchange carbon 
atoms in each of its reactions. This is now available for 
further research. 

The effect of CFPs is particularly observed when our 
approach predicted the association of LCystin with the 
enzyme Sulfite Oxidase (SULFOX). By definition, CFPs 
involve carbon exchange in their intermediate reaction 
steps. However, there is no carbon flux from LCystin to 
any metabolite taking part in SULFOX, since they are 
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inorganic metabolites and cofactors. Interestingly, this 
reaction is important to mass balance the obtained paths 
from LCystin to the rest of metabolites. This case also il- 
lustrates the idea that enzymes at considerable distance 
from the metabolite under consideration may regulate 
its concentration. 

Given the relevance of CFPs in the performance of our 
approach, improving their accuracy is certainly relevant. 
As noted in Pey et al. [26], CFPs must still face different 
issues. In particular, guaranteeing carbon exchange be- 
tween the source and target metabolites is essential and 
this is not fully satisfied in their current format. We en- 
sure carbon exchange in each of its intermediate steps, 
but not between the source and target. In this direction, 
the release of databases incorporating atomic mapping 
of metabolites at large scale is promising [57]. 

In addition, our approach sacrificed some accuracy by 
neglecting classical regulatory mechanisms with the aim to 
extend our analysis to genome-scale metabolic networks. 
Regulatory information is certainly relevant for explaining 
changes in metabolite levels; however, it is scarce for 
genome-scale networks. The issue will require further con- 
sideration when such data becomes widely available. 

Finally, we believe our approach will be a practical tool 
to study poorly understood disease phenotypes. Extending 
its application to other diseases (obesity, cancer, dia- 
betes...) will be a major activity in the future, precisely 
with the emergence of metabolomics studies. This work 
contributes to the development of Systems Medicine, an 
emerging field aiming to provide answers to clinical ques- 
tions based on theoretical methods and high-throughput 
"omics" data. 



Methods 

Carbon Flux Paths (CFPs) 

In Pey et al. [26] , we presented details as to the mathemat- 
ical formulation of the CFP approach. Similarly to other 
graph-based methods for the analysis of metabolic net- 
works, CFPs search for simple paths that link a pair of me- 
tabolites. However, CFPs satisfy two additional properties: 
i) ability to work in sustained steady-state; and ii) carbon 
exchange in each of its intermediate steps. The CFP ap- 
proach is an integer linear program. We summarize below 
their main features. 



i=l j=lj*i 

subject to: 

c c 

;=1 i=l 



c c 

z u > a = z "fi = 0 ( 2 ) 

1=1 7=1 
C C 

^ utk = Ukj k = l,...,C;k*a,fi (3) 

j=l /=1 
C 

^w,*<l k = l,...,C (4) 

j=i 

R 

Z 5 crV r = 0 C=l,...,C (5) 

r=l 

z r <v r <Mz r r = 1, ...,R (6) 

Zx+Z^l V(A,p)eB (7) 

R 

^2 z r^Uij i= l,...,C;j= l,...,C;iV; (8) 

v r >0 r=l,...,R 

z r e{0,l} r=l,...,R 

Uy&{0, 1} i = 1, C;j = 1, C; i*j 

A CFP constitutes a steady- state (balanced) flux distri- 
bution that involves a directed path (in the graph theor- 
etical sense) between a given source (a) and target (P) 
metabolites. We first consider variables and constraints 
to obtain a directed path between a and [3. We then il- 
lustrate constraints for steady-state flux distributions. Fi- 
nally, both sets of constraints are linked. 

Assume we have a metabolic network that comprises 
R reactions and C metabolites. Let S cr be the stoichio- 
metric coefficient associated with metabolite c in reac- 
tion r. We used a (directed) graph representation of the 
network where nodes are metabolites and arcs represent 
carbon exchange between substrates and products of 
reactions. 

Binary variables Uy stand for active arcs involved in 
the directed path between a and p, namely u t j = 1 when 
an arc between metabolites i and / is active, w,y = 0 other- 
wise. By means of Equations (1) and (2) we ensure that 
one arc leaves the source and one enters the target 
metabolite; and that no arc enters the source nor leaves 
the target. For intermediate metabolites in the path, the 
number of arcs entering them must be equal to the 
number of arcs leaving them, as imposed by Equation 
(3). Equation (4) ensures that a metabolite cannot be 
revisited in the path. 

A steady-state flux distribution satisfies Equation (5). 
Note that the continuous variable v r represents the flux 
through reaction r (r=l,...,R), which by definition is non- 
negative. We also define a set of binary variables z r 
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closely related with flux variables, namely z r = 1 if v r > 0 
and z r = 0 if v r = 0. We need Equation (6) to link v r and 
z r variables. M is a big scalar constant that sets an upper 
bound for reaction fluxes. Note that we split reversible 
reactions into two single irreversible steps. Therefore, 
we need to prevent a reaction and its reverse from 
appearing together in any steady-state flux distribution. 
This is modeled in Equation (7), where the set B={(A,u)| 
reaction \ and reaction u are the reverse of each other}. 

Equation (8) is introduced to link path constraints, 
Equations (l)-(4), and flux balancing constraints, 
Equations (5) -(7), which guarantees that if an arc is se- 
lected, at least one of the reactions r with an existing arc 
(carbon exchange) between metabolites i and / (dy? = 1) 
must carry flux. This last constraint ensures that the di- 
rected path found can perform in sustained steady-state, 
as claimed in Pey et al. [26]. 

Finally, we use an objective function that minimizes 
the number of arcs in the directed path, i.e. the shortest 
path between a and (3. Further details can be found in 
Pey et al. [26]. 

Average length increasing parameter (ALj) 
The average length increasing parameter (ALj) provides 
the difference in the average number of steps of metabo- 
lites connected to the metabolite under study between a 
knockout (K) scenario of enzyme / (Lj ) and the wild- 
type (W) scenario (£. ), as observed in Equation (9). We 
detail below how to calculate this parameter. 



AL, = Lf-Lf 



D 



D 



max, W 



argmin x (|Cf ; | 
argmin x (|Cf | 



\c K J\) 





; 






r K.j 






C K J 





w 



YZT(\ c T^ K J\-\c7^c«j\)4 



\c K J\ 



(9) 
(10) 
(11) 

(12) 
(13) 



C^" 7 is the set of metabolites connected to the source 
metabolite in at most i steps when enzyme j is knocked 
out, while CY is the same value in the wild-type scenario; 



|Cf' ; '| and \Cf\ are the cardinality of sets C ( - ,; and Of, 

respectively. Cf/ is the set of metabolites connected to 
the source metabolite after moving °° steps away. In other 
words, C^J represents the full set of metabolites 
connected to the source metabolite when enzyme / is 
knocked out, being \C^J\ its cardinality. Cjf and |Cjf | 
represent the same in the wild-type scenario. 



As observed in Equation (10), d™ 3 ** is the minimum 
distance required to reach the full set of metabolites 
connected the metabolite under study when enzyme j is 
knocked out [C^J); £> m ^- w represents the same value in 
the wild-type scenario (see Equation (11)). 

L? and L. average distances among metabolites 
connected to the source metabolite in the knockout and 
wild-type scenarios, respectively. See Equations (12)- 
(13). Note here that, in order to compare both scenarios, 
we only considered those metabolites that remain con- 
nected when enzyme / is knocked out. For this reason, we 
used the intersection expression in Equation (13). 

Enzyme p-value calculation 

The p-value defines the probability of obtaining an at 
least as extreme outcome according to a predefined Null 
Hypothesis (H 0 ). Therefore, an adequate H 0 must be de- 
fined in the first place. In our particular statistical test, 
H 0 represents that the ranking of an enzyme is not de- 
termined by the accumulated metabolite under study. In 
other words, the resulting positions of the enzymes in 
the ranking derived from our approach are not specific 
of the metabolite under study. 

Based on this H 0 definition, an appropriate distribu- 
tion function should be introduced. We define a set of 
positions X t for a particular enzyme / regarding the accu- 
mulation of N different metabolites (X li X 2 ... X N ). We 
assume that all the variables X, come from the same the- 
oretical distribution F. In addition, we introduce F* w as 
the cumulative sampling distribution associated with the 
random sample set {X lt X 2 ... X N ). In essence, F* N as- 
cribes a probability equal to 1/N to each of the sample 
observations. The formal representation of F* N is presented 
in Equation (14). Note here that x represents a given pos- 
ition in the ranking of enzyme /'. 



1 - 



(14) 



Note that Ia(j) is the indicator function corresponding 
to the set A: 



iA(y) 



1 if yeA 
0 if y$A 



(15) 



Once F* N is correctly defined, the calculation of the 
p-values corresponding to a determined ranking is 
straightforward. Assuming that enzyme / appears in the 
k-th position in a particular ranking, the corresponding 
p-value is calculated as follows: 



P valued) = F* N (k) 



(16) 



In the case study Results section, we selected 1000 
random metabolites and ranked enzymes based on the 
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CC approach, i.e. 7V=1000. The position in the ranking 
for each enzyme in the different scenarios (1000 random 
metabolite accumulation) is used to build its empirical 
distribution (F'W)- This is then used to determine the p- 
value associated with the enzymes top-ranked in the me- 
tabolites under study. 

Additional files 



Additional file 1: Toy example full details. 

Additional file 2: Manually curated carbon arcs from Reconl. 

Additional file 3: Lcystin and Hcys analysis. 
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